This R notebook is a rapid analysis of the cohort data produced as part of the data synthesis of citizen science projects.
# Packages used below (readxl provides read_excel; plotly also attaches ggplot2)
library(readxl)
library(tidyr)
library(plotly)
library(reshape2)
Let's load in the data. It is currently stored as separate sheets in an Excel file, so let's pull out each sheet and assign it to its own data.frame.
summary <- read_excel(path = '../data/BES citsci data summary v3.xlsx', sheet = 1)
metadata <- read_excel(path = '../data/BES citsci data summary v3.xlsx', sheet = 2)
retention_tasks <- read_excel(path = '../data/BES citsci data summary v3.xlsx', sheet = 3)
retention_time <- read_excel(path = '../data/BES citsci data summary v3.xlsx', sheet = 4)
ppt_inequality <- read_excel(path = '../data/BES citsci data summary v3.xlsx', sheet = 5)
I think I need to spread the cohort data so that time is represented in columns
ret <- pivot_wider(data = retention_time, names_from = 'Session', values_from = 'NumberOfPeople')
head(ret)
# easy to view but not the format for ggplot, so keep working with the long data
retention_time$CohortSession <- (retention_time$Session + 1) - retention_time$Cohort
retention_time$ProjectCohort <- paste(retention_time$ProjectCode,
                                      retention_time$Cohort,
                                      sep = '_')
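To make the re-indexing concrete, here is a minimal sketch on a made-up data frame. The `toy` object and its values are hypothetical, not taken from the spreadsheet, and it assumes `Cohort` and `Session` are numeric and counted on the same scale, so that a cohort's first session gets `CohortSession` 1.

```r
# Toy data (hypothetical values, not from the Excel file)
toy <- data.frame(
  ProjectCode    = c("A", "A", "A", "A", "A"),
  Cohort         = c(1, 1, 1, 2, 2),
  Session        = c(1, 2, 3, 2, 3),
  NumberOfPeople = c(50, 30, 20, 40, 10)
)

# Session number relative to the session in which the cohort started
toy$CohortSession <- (toy$Session + 1) - toy$Cohort

# One identifier per project/cohort combination, used to group lines in a plot
toy$ProjectCohort <- paste(toy$ProjectCode, toy$Cohort, sep = "_")

toy
```

Cohort 2 starts in session 2, so its rows are re-indexed to sessions 1 and 2, which lines it up with cohort 1 on a common x-axis.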
That works. Now we can do an initial visualisation of the data.
p <- ggplot(retention_time, aes(x = CohortSession, y = NumberOfPeople, group = ProjectCohort)) +
  geom_line(aes(colour = ProjectCode))
ggplotly(p)
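Okay, now we need to rescale each cohort so that they all start at the same value, and then average across cohorts in the same project. The rescaling divides by each cohort's maximum, so it assumes a cohort is at its largest in its first session.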
# Rescale
rescale <- function(x){
  (x/max(x)) * 100
}
retention_time$NumberOfPeopleRescaled <- NA
for(i in unique(retention_time$ProjectCohort)){
  retention_time$NumberOfPeopleRescaled[retention_time$ProjectCohort == i] <-
    rescale(retention_time$NumberOfPeople[retention_time$ProjectCohort == i])
}
p <- ggplot(retention_time, aes(x = CohortSession, y = NumberOfPeopleRescaled, group = ProjectCohort)) +
  geom_line(aes(colour = ProjectCode))
ggplotly(p)
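As an aside, the per-cohort rescaling loop can also be written without an explicit loop using base R's `ave()`, which applies a function within groups and returns a vector in the original row order. This is only a sketch of an alternative on hypothetical toy data (the `toy` data frame is made up), not the approach used in this notebook.

```r
# Toy data (hypothetical values): two cohorts in one project
toy <- data.frame(
  ProjectCohort  = c("A_1", "A_1", "A_1", "A_2", "A_2"),
  NumberOfPeople = c(50, 30, 20, 40, 10)
)

# Same rescaling as above: express each value as a percentage of the group maximum
rescale <- function(x) {
  (x / max(x)) * 100
}

# ave() applies rescale() within each ProjectCohort and keeps the row order
toy$NumberOfPeopleRescaled <- ave(toy$NumberOfPeople,
                                  toy$ProjectCohort,
                                  FUN = rescale)

toy
```

Each cohort's largest value becomes 100, so the 30 participants in cohort `A_1` become 60 and the 10 in cohort `A_2` become 25.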
And now average across the cohorts within each project
# group and average
av_ret <- tapply(retention_time$NumberOfPeopleRescaled,
                 INDEX = list(retention_time$ProjectCode,
                              retention_time$CohortSession),
                 FUN = mean)
# Put into long format
av_ret <- melt(av_ret, varnames = c('ProjectCode', 'CohortSession'), value.name = 'NumberOfPeopleRescaled')
av_ret
# plot
p <- ggplot(av_ret, aes(x = CohortSession, y = NumberOfPeopleRescaled,
                        group = ProjectCode)) +
  geom_line(aes(colour = ProjectCode))
ggplotly(p)
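The group-and-average step could also be done in one call with base R's `aggregate()`, which returns a long-format data frame directly. The sketch below uses hypothetical toy data (`toy` and `av_toy` are made-up names and values) and is offered as a possible alternative rather than what the notebook does.

```r
# Toy data (hypothetical values): rescaled retention for two projects
toy <- data.frame(
  ProjectCode            = c("A", "A", "A", "A", "B", "B"),
  CohortSession          = c(1, 1, 2, 2, 1, 2),
  NumberOfPeopleRescaled = c(100, 100, 60, 40, 100, 30)
)

# Mean rescaled retention for every project / cohort-session combination
av_toy <- aggregate(NumberOfPeopleRescaled ~ ProjectCode + CohortSession,
                    data = toy,
                    FUN = mean)

av_toy
```

One difference to be aware of: `tapply()` followed by `melt()` keeps an `NA` row for every project and session combination that has no data, whereas `aggregate()` only returns combinations that occur in the data.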